[SNU EE AI Seminar] Convex Optimization for Machine Learning (ML)
Abstract
This presentation explores the fundamental role of convex optimization in machine learning, emphasizing that convex problems form one of the few classes of optimization problems that can be solved reliably and efficiently. Many machine learning algorithms rest on convex optimization techniques, from classical methods such as least squares and linear programming to more advanced formulations including semidefinite programming and max-det problems. Because every local minimum of a convex problem is also a global minimum, formulating machine learning problems within the convex optimization framework offers both mathematical elegance and practical guarantees.
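For reference, the standard form of a convex optimization problem alluded to above can be written as follows; the notation is the usual one from convex optimization texts rather than taken verbatim from the slides, and the local-equals-global argument is sketched in the comment.

```latex
\begin{align*}
\text{minimize}   \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m, \\
                        & a_j^\top x = b_j, \quad j = 1, \dots, p,
\end{align*}
% where f_0, \dots, f_m are convex functions. If x^* were locally but not
% globally optimal, some feasible y would satisfy f_0(y) < f_0(x^*); since the
% feasible set is convex, the points (1-t)x^* + t y are feasible for t in (0,1]
% and, by convexity of f_0, have objective value below f_0(x^*) arbitrarily
% close to x^*, contradicting local optimality.
```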
The core content bridges optimization theory and practical machine learning through a detailed examination of key algorithms. Linear regression is presented as a foundational least squares problem with a closed-form solution, and its extensions to Ridge regression and Lasso show how regularization can be incorporated while preserving convexity. Support vector machines are explored both in their basic formulation and with kernel transformations, illustrating how complex classification problems can be solved efficiently using convex optimization principles. The presentation also covers the key theoretical tools of Lagrangian duality, KKT conditions, and complementary slackness, which provide the mathematical foundation for practical algorithm implementation.
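As a minimal sketch of the least squares and Ridge solutions described above (a NumPy illustration on synthetic data; the problem sizes, variable names, and regularization weight are illustrative assumptions, not from the talk, and Lasso is omitted because its l1 penalty has no closed form and is usually handled by iterative convex solvers):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5                                  # samples, features (arbitrary demo sizes)
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)  # noisy linear observations

# Ordinary least squares: minimize ||X w - y||^2.
# The normal equations (X^T X) w = X^T y give the closed-form solution;
# lstsq solves them in a numerically stable way.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge regression: minimize ||X w - y||^2 + lam * ||w||^2.
# The l2 penalty keeps the problem convex (in fact strongly convex) and
# only shifts the normal equations by lam * I.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("OLS   error:", np.linalg.norm(w_ols - w_true))
print("Ridge error:", np.linalg.norm(w_ridge - w_true))
```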
Multiple perspectives on machine learning are synthesized to convey the field's interdisciplinary nature. The statistical viewpoint connects maximum likelihood estimation to Kullback-Leibler divergence minimization and shows that MLE is equivalent to mean squared error minimization under Gaussian noise assumptions. The computer science perspective addresses neural network architectures and hyperparameter optimization, while the numerical perspective focuses on gradient-based methods, including stochastic gradient descent and adaptive algorithms such as Adam and RMSProp. Together, these views show how convex optimization theory supplies the mathematical foundation for understanding, analyzing, and developing robust machine learning algorithms across diverse application domains.
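As a reference for the statistical viewpoint, the MLE and mean squared error equivalence under Gaussian noise follows from one line of algebra (a standard derivation written out here, not copied from the slides):

```latex
% Model: y_i = f_\theta(x_i) + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \text{ i.i.d.}
\hat{\theta}_{\mathrm{MLE}}
  = \arg\max_\theta \sum_{i=1}^{n} \log \mathcal{N}\!\bigl(y_i \mid f_\theta(x_i), \sigma^2\bigr)
  = \arg\max_\theta \sum_{i=1}^{n} \Bigl( -\tfrac{1}{2\sigma^2}\bigl(y_i - f_\theta(x_i)\bigr)^2
      - \tfrac{1}{2}\log 2\pi\sigma^2 \Bigr)
  = \arg\min_\theta \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2 ,
% since the log-normalization term does not depend on \theta and the factor
% 1/(2\sigma^2) only rescales the objective.
```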